Globally, human immunodeficiency virus-type 1 (HIV-1) is extraordinarily variable, 
and this diversity poses a major obstacle to AIDS vaccine development. Currently, 
candidate vaccines are derived from isolates, with the hope that they will be 
sufficiently cross-reactive to protect against circulating viruses. This may be overly 
optimistic, however, given that HIV-1 envelope proteins can differ in more than 30% 
of their amino acids. To contend with the diversity, country-specific vaccines are being 
considered, but evolutionary relationships may be more useful than regional 
considerations. Consensus or ancestor sequences could be used in vaccine design to 
minimize the genetic differences between vaccine strains and contemporary isolates, 
effectively reducing the extent of diversity by half. 



Since the HIV-1 M group began its expansion in humans roughly 70 years ago (1, 2), 
it has diversified rapidly (3), now comprising a number of different subtypes and 
circulating recombinant forms (CRFs). The HIV-1 M group is the set of diverse 
viruses that dominates the global AIDS epidemic. Subtypes are genetically defined 
lineages that can be resolved through phylogenetic analysis of the HIV-1 M group as 
well-defined clades, or branches, in a tree. Recombination occurs frequently, and a 
CRF carries sections of two or more subtypes in a mosaic genome; a recombinant 
lineage is designated a CRF when related forms are found in multiple 
epidemiologically unlinked individuals. Currently, strains belonging to the same 
subtype can differ by up to 20% in their envelope proteins, and between-subtype 
distances can soar to 35%. Moreover, this diversity is continually growing. The need 
for frequent changes in the annual influenza vaccine puts into perspective the 
implications of such diversity: less than 2% amino acid change can cause a failure 
in the cross-reactivity of the polyclonal response to the influenza vaccine and 
necessitates changing the vaccine strain (4). 
Although the scale of the HIV-1 pandemic 
makes action imperative, there is 
still much to learn about the extent and 
immunological implications of HIV-1 se- 
quence diversity. We do, however, have in 
hand the fruits of an extensive global HIV 
sequencing effort [currently there are 
72,221 HIV sequences in the database (3)] 
that can provide a framework for reasoned 
vaccine strain selection. Optimizing selec- 
tion is of the utmost urgency, as a number 
of human vaccine trials are being planned 
and initiated (5, 6), and it is difficult to 
change strains during the long course of 
vaccine development, from initial concept 
to human trial. Subtype C is the most prev- 
alent HIV-1 subtype globally, and it pre- 
dominates in several geographic regions 
where vaccines might be evaluated. In 
these regions, the epidemiologically un- 
linked prevalence of subtype C infections 
can exceed 30% of the adult population (7). 
Therefore, this exploration of the implica- 
tions of HIV variation for vaccine strain 
selection focuses on subtype C; we believe, 
however, that our reasoning and findings 
can be extrapolated to other intra- and in- 
tersubtype scenarios. 

Although there is hope that a single vac- 
cine strain may elicit a sufficiently cross- 
reactive response to confer a benefit, there is 
great interest in attempting to optimize vac- 
cines through considerations of diversity. 
There are currently two general approaches to 
selecting vaccine strains that attempt to contend 
with the high levels of HIV sequence 
variation (Table 1). The first is based on 
using isolates of a particular subtype, some- 
times selected from a geographic region 
where the vaccine is intended for use. Exam- 
ples of this approach that are under way 
include the development of several A- and 
C-subtype vaccines, as well as CRF01 vac- 
cine reagents (5, 6). This kind of approach 
can be integrated with biological consider- 
ations, such as coreceptor usage, neutraliza- 
tion susceptibility, neutralization potency of 
the serum from the individual from whom the 
isolate was obtained, or the preferential use of 
isolates from recent seroconvertors (8). The 
second approach, rather than using actual 
viruses from within the population, is to con- 
struct either a consensus sequence or an an- 
cestral sequence reconstructed on the basis of 
an evolutionary model. Such sequences have 
the advantage of being central and most sim- 
ilar to currently circulating strains of interest 
and may have enhanced potential for eliciting 
cross-reactive responses. They may also have 
economic and political advantages that merit 
consideration. Economic, because it is not 
feasible to duplicate vaccine design efforts 
using country-specific strains for every na- 
tion and region that needs a vaccine, and this 
is a way to limit the number of constructs that 
must be produced and tested in a way that is 
logical and scientifically defensible. Political, 
because such artificial sequences are not as- 
sociated with any specific country of origin, 
so nations hosting vaccine trials would not 
need to contend with the natural concerns that 
arise when asked to host a vaccine trial using 
HIV-1 antigens with distant geographic origins. 
Some subtype C sequences of current 
interest are indicated in Fig. 1, a phylogenetic 
tree that includes geographic information to 
provide a basis for considering candidate vac- 
cine strains. (Phylogenetic terms and con- 
cepts used in this article are defined in the 
legend to Fig. 1.) These two basic approaches, 
in different ways, directly confront the 
problem of viral diversity; we will explore 
their advantages and disadvantages in the 
sections that follow. 

Fig. 1. Maximum likelihood phylogenetic tree showing the genetic distances and relationships of 
potential vaccine strains to subtype C gag sequences, and to representatives from other subtypes. 
The external nodes, the branch tips on the right of the tree, each represent an actual sequence. The 
interior nodes, or branch points, are ancestral to the "clade" of sequences that branch to their right. 
This tree uses the M-group consensus as the outgroup, a sequence brought into the analysis to help 
determine the ancestral states and root of the "ingroup," in this case the HIV-1 M group. 
Constructing the tree with the M-group consensus sequence as an outgroup forces the ancestral 
node of the M group to be central to all of the subtypes; using a more conventional strategy of 
selecting another primate lentiviral sequence for an outgroup can lead to statistically unsupported 
and unrealistic locations for ancestral nodes (7). The horizontal branch lengths in the trees 
represent evolutionary distances and indicate how many nucleotide substitutions have occurred; 
these are estimated from the evolutionary model. The two-letter code for the country of origin of 
subtype C sequences is indicated: India, IN; South Africa, ZA; Botswana, BW; Tanzania, TZ; Israel, IL; 
Ethiopia, ET; Zambia, ZM; and Brazil, BR. The locations of the C-subtype and M-group consensus and 
ancestor are indicated. Potential C-subtype vaccine strains are marked by a bold branch and their 
isolate name; these include Brazilian BR025, one of the first available subtype C isolates; ZM651 
(AF286244), a Zambian strain; IN101 (AB023804), an Indian sequence discussed in the text; and 
sequences derived from recent South African isolates, including Du422 (AY043175), and three 
available for reagent development through the UNAIDS network, derived from viable isolates with 
full-length sequenced clones, ZA009 (AY118166), ZA003 (AY118165), and ZA012 (AF286227) (24, 
53). [Note: There are two South African samples designated ZA009, one from a B clade infection 
(AF095828), and one recently obtained UNAIDS Network C clade isolate; in this paper ZA009 refers 
to the C clade isolate.] The scale bar indicates the genetic distance along the branches. Maximum 
likelihood trees for gag and env with all sequences labeled are also available (32). 

Other concepts being explored to ad- 
dress diversity issues can ultimately be 
considered in the framework of the two 
basic approaches to strain selection de- 
scribed above. For example, multivalent 
cocktails of proteins that include a spec- 
trum of regional variants are being evalu- 
ated. Most of these strategies assume that 
the immune responses elicited by any one 
circulating strain will be of sufficient cross- 
reactivity to protect against other strains 
from the same subtype. Given that intra- 
subtype diversity in variable proteins can 
reach 20%, even this assumption may be 
too optimistic. Consensus sequences and 
strains from viable isolates could be com- 
bined in a polyvalent approach. A second 
strategy is to design modified envelopes to 
enhance exposure of epitopes known to be 
capable of inducing broadly neutralizing 
antibodies (9-11). There are a limited number 
of monoclonal antibodies that have 
broad, cross-clade neutralization capabili- 
ties (11-13), and these antibodies can act 
synergistically (13-15). Vaccines specifi- 
cally designed to target these conserved 
epitopes, if successful, may ultimately be 
optimized by fine-tuning as subtype-specific 
vaccines, as appropriate; although there 
is little evidence that subtypes correspond 
to neutralization phenotypes, in some cases 
particular clades can be refractive or less 
susceptible to particular antibodies. For exam- 
ple, the three broadly neutralizing monoclonal 
antibodies 2F5, 2G12, and IgG1b12, raised 
against B clade strains, were not individually 
able to neutralize a C clade primary isolate, 
although they could in combination (14). Even 
if subtypes are not relevant to a particular 
neutralizing epitope, lineage-specific variation 
within the relevant antigenic domain may still 
be worth considering. A third strategy for con- 
tending with diversity is to use polyvalent pep- 
tides spanning a region like the V3 loop that 
induces strain-specific neutralizing antibodies 
(16), to attempt to elicit a set of responses that 
together confer cross-reactive protection (17, 
18). 

Isolate-Based Vaccines 

Geographic considerations. For historical reasons, 
AIDS vaccine reagents were first developed 
from subtype B viruses, the dominant 
subtype in the United States and Europe. It has 
been proposed that such strains be included in 
vaccine trials conducted in populations infected 
with subtypes other than B. Cytotoxic T lym- 
phocyte (CTL) studies provide evidence for 
cross-subtype T cell responses, and B viruses 
have been studied for a longer time, so re- 
searchers can more rapidly move forward in 
safety and immunogenicity (phase I) studies (5, 
6). However, T cell immune responses in gen- 
eral are more intense and have greater breadth 
within a subtype (19-23). Thus, although there 
is potential for cross-reactivity and even for 
synergistic interactions between antibodies 
(15), it is more than likely that both the breadth 
and intensity of polyclonal T and B cell im- 
mune responses to cross-clade immunogens 
will be suboptimal and that important epitopes 
will be missed. Subtype-specific, single-strain, 
or combination vaccines have been strongly 
advocated in recent years (24-26), and approx- 
imately 10 subtype C vaccines are poised to 
enter phase I trials in India, South Africa, and 
China (27). 

There has been some discussion of choosing 
a regional strain for a vaccine, for example, an 
Indian strain to be used in India, and a South 
African strain in South Africa, and so on (8). 
There is little support for this in terms of se- 
quence analysis. Subtype C sequences from 
Botswana and South Africa intermingle (28), 
and there is no obvious choice of a single 
sequence most representative of the diversity in 
these regional samples (28); however, selecting 
a sequence with a short branch length relative to 
the common ancestor (29) in C clade might be 
advantageous, as it would tend to be most sim- 
ilar to the majority of contemporary sequences 
represented in the tree (30). Conversely, it 
would be sensible to avoid selecting outliers. 
Indian sequences tend to form a distinct subclade 
(31) within the C clade, indicating that 
most sampled Indian viruses are descended 
from a single founder strain. A small number of 
sequences from African nations are associated 
with sequences from India, however, and the 
sampling is extremely limited relative to the 
scale of the epidemic in both regions. Thus, 
there may be continuing movement of the virus 
between Africa and India. The Indian clade 
sequences tend to have short branch lengths 
relative to the root of the C clade. As a consequence, 
a strain like IN101 (accession number 
AB023804) from India is closer to most Afri- 
can subtype C strains than African strains are to 
each other (32), an interesting quirk that em- 
phasizes that it does not necessarily confer an 
advantage to select a strain from the country 
where a vaccine trial will be held. 

There are other subclades within the C 
subtype, besides the Indian subclade (33), 
that could be considered as vaccine candidates 
(Fig. 1), but such subclades tend to have 
much shorter defining branch lengths than 
subtypes and, consequently, fewer distin- 
guishing amino acids, so the benefit of con- 
sidering them each separately for vaccines 
diminishes. On rare occasions, geographical- 
ly localized epidemics have been identified 
soon after the introduction of a founder virus, 
and prevalent viruses were highly related, as 
in Thailand (34) and Kaliningrad (35). It may 
ultimately be advantageous to develop vec- 
tors and strategies for rapid-response vaccine 
programs in such circumstances, when a 
highly similar virus is spreading explosively 
through a vulnerable population, but first, a 
working vaccine concept must be in place. 

Evolutionary evidence for subtype-spe- 
cific antigenicity in the envelope protein. 
Clearly, subtype-specific vaccines would 
increase the overall sequence similarity of 
the vaccine antigen relative to circulating 
viruses (Fig. 2), but this is only part of the 
story for antibody binding, because protein 
folding and exposure of antigenic domains 
are of great importance. To explore the 
hypothesis that there may be subtype-spe- 
cific patterns in the exposure of antigenic 
domains that are able to elicit antibody 
responses strong enough to drive escape 
mutations, we compared estimates of 
codon-specific ratios of nonsynonymous to 
synonymous substitution rates (dN/dS) 
(36) in B clade and C clade envelope genes. 
(We selected B and C subtypes, as B clade 
vaccines are being considered for use in 
populations where the C subtype domi- 
nates.) High rates of diversifying selection 
were identified in different regions of the 
envelope protein (Env) in the two lineages, 
most strikingly in the Env V3 to C4 region. 
The V3 loop is less variable in the C 
subtype than in other subtypes (37), and as 
expected, the density of sites in the V3 
loop with dN/dS > 1 was higher in the B 
clade than in the C clade. This pattern was 
reversed, however, in the region just prox- 
imal to the V3 loop, where multiple sites 
show an excess of nonsynonymous 
substitutions in the C clade but not in the 
B clade (Fig. 3). To explore the consist- 
ency of these patterns within the C clade, 
three subclades, or phylogenetically asso- 
ciated groups of sequences within the C 
subtype, were examined independently. 
Two sets had 21 subtype C sequences, 
and the third had 18 sequences. The results 
suggested substantial intraclade coherence in how 
selection acts on individual codons within the C 
subtype; the 12 strongly selected sites in the 
region downstream of the V3 loop (Fig. 3) had a 
dN/dS ratio > 1 in each of the three independent 
C-subtype data sets, and the tip of the V3 loop was 
relatively constrained with low dN/dS values. Given 
that immune escape is likely to be a driving force 
of positive selection, immune pressure may be 
focused on different regions of Env in the B subtype 
(the V3 loop) and C subtype (the COOH-terminal 
region beyond the V3 loop). If this interpretation 
of the observed differences in selection pressure in 
B and C subtypes is correct, there may be advantages 
in using a clade-appropriate vaccine strain, as the 
immune response to the vaccine and the circulating virus 
would share antigenic domains. 
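
To make the distinction between synonymous and nonsynonymous change concrete, the following minimal Python sketch counts, codon by codon, how many sequences in a toy alignment differ from a reference codon without changing the amino acid (synonymous) or with an amino acid change (nonsynonymous). It is an illustration only and is not the method used for the dN/dS estimates discussed above (36): a true dN/dS ratio also normalizes by the numbers of synonymous and nonsynonymous sites and is estimated under an evolutionary model. The function name `classify_changes` and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_changes(reference, others):
    """For each codon, count sequences that differ from the reference
    synonymously (same amino acid) or nonsynonymously (different one).

    Raw counts only: this is NOT a dN/dS estimate.
    """
    counts = []
    for i in range(len(reference) // 3):
        ref_codon = reference[3 * i:3 * i + 3]
        syn = nonsyn = 0
        for seq in others:
            codon = seq[3 * i:3 * i + 3]
            # Skip identical codons and anything with gaps or ambiguity codes.
            if codon == ref_codon:
                continue
            if codon not in CODON_TABLE or ref_codon not in CODON_TABLE:
                continue
            if CODON_TABLE[codon] == CODON_TABLE[ref_codon]:
                syn += 1
            else:
                nonsyn += 1
        counts.append((syn, nonsyn))
    return counts


# Hypothetical in-frame toy alignment (three codons: M, K, G).
ref = "ATGAAAGGT"
others = ["ATGAAGGGT", "ATGAGAGGC", "ATGGAAGGT"]
per_codon = classify_changes(ref, others)
```

In this toy case the middle codon collects more nonsynonymous than synonymous differences, which is the kind of site-specific excess that the codon-based analysis is designed to detect rigorously.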

Artificial Sequences for Minimizing 
Diversity 

An effective way to minimize the degree of 
sequence dissimilarity between a vaccine 
strain and contemporary circulating viruses is 
to create artificial sequences that are "cen- 
tral" to these viruses. The simplest way to 
design such a sequence is to use a consensus 
sequence based on the most common amino 
acid in each position in an alignment (33, 38). 
Alternatively, a model of the most recent 
common ancestral sequence of an appropriate 
lineage can be reconstructed from a phyloge- 
netic tree, for example, by means of maxi- 
mum likelihood. The most likely sequence at 
any interior node in a tree can be derived 
from the sequences used to construct the tree, 
the evolutionary model used (how often one 
base is mutated to another, and the relative 
mutation rate at each site), and the branching 
pattern of the tree. Figure 1 illustrates where 
the C consensus and ancestral branch points 
are located in the tree. Both of these sequences 
are more "central," i.e., they are closer to 
modern C-subtype sequences than modern 
sequences are to each other. As artificial se- 
quences, their construction depends on the 
sequences included in the analysis and so will 
change as the database expands. 
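
The consensus construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to build the database consensus sequences: it assumes a gap-free toy protein alignment, breaks ties alphabetically (an arbitrary choice), and uses hypothetical function names (`consensus`, `percent_difference`) and made-up sequences.

```python
from collections import Counter
from itertools import combinations


def consensus(aligned):
    """Return the most common character in each column of an alignment."""
    if not aligned:
        raise ValueError("alignment is empty")
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    out = []
    for column in zip(*aligned):
        counts = Counter(column)
        # Highest count wins; ties are broken alphabetically (an assumption).
        out.append(min(counts, key=lambda c: (-counts[c], c)))
    return "".join(out)


def percent_difference(a, b):
    """Percentage of aligned positions at which two sequences differ."""
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical toy alignment of four "isolates".
toy = ["MKVLA", "MRVLA", "MKVIA", "MKTLA"]
cons = consensus(toy)

mean_to_consensus = sum(percent_difference(cons, s) for s in toy) / len(toy)
pairs = list(combinations(toy, 2))
mean_pairwise = sum(percent_difference(a, b) for a, b in pairs) / len(pairs)
```

In this toy set the consensus is, on average, closer to each isolate than the isolates are to one another, which is the sense in which such a sequence is "central."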

Envelope proteins are the most difficult 
HIV proteins to construct artificially, as both 
ancestral and consensus sequences contain hy- 
pervariable domains with multiple insertions 
and deletions (indels). Alignments are subjec- 
tive in such regions, and indels do not evolve 
according to the base substitution models cur- 
rently assumed in deriving a maximum likeli- 
hood tree. For constructing our consensus and 
ancestor sequences (3), hypervariable regions 
are aligned by anchoring on glycosylation sites, 
and only minimal common elements spanning 
the region are retained. As both consensus and 
ancestral sequences are derived and not actual 
sequences, expression, antigenicity, and biolog- 
ical activity require careful characterization be- 
fore use in a vaccine (39). 

Fig. 2. Scanning the HIV-1 genome and proteins to illustrate similarities between potential vaccine candidates and sequences 
from isolates. (A) and (B) compare 23 full-length subtype C sequences from South Africa, Botswana, and India with potential 
vaccine sequences. Green lines represent the comparison with the subtype C consensus sequence. The purple and blue lines 
show the comparison of the sequences of vaccine candidates BR025 and ZA003 (described in Fig. 1), respectively. The red 
lines show an interclade comparison of subtype C sequences with the B clade sequence JRCSF. (A) shows a nucleotide 
similarity plot, and (B) shows the corresponding amino acid similarity plot. 

Fig. 3. The dN/dS ratio at each position in the V3 region, comparing a B-subtype and a C-subtype alignment. The dN/dS 
ratio was determined for each codon in an env alignment of C- and B-subtype sequences. The V3 region gave a particularly 
striking distinction between the two subtypes, illustrated here. The blue lines indicate the dN/dS ratio in the V3 loop of 
the B subtype, a region known to be a target of type-specific neutralizing antibodies. Four codons on either side of the 
tip of the V3 loop have dN/dS ratios over 5, indicative of very strong positive selection. The red lines indicate the dN/dS 
ratio of the C subtype, and there is no strong pressure for change near the tip of the V3 loop. In contrast, downstream of 
the V3 loop there are 12 codons that exhibit high dN/dS ratios (>4) in the C subtype and only three in the B subtype. This 
suggests different regional evolutionary pressures in the two subtypes, and possibly distinct regions of antigenic exposure 
in these regions in the B and C lineages. 

Although artificial sequences may not 
have a proper protein conformation, and this 
may be critical for antibody responses, it is 
less important for designing T cell epitopes 
or peptide reagents for testing T cell 
responses. Consensus sequences may 
be ideal for peptides used to explore 
the T cell immune response, as they 
would probably improve recognition 
compared with any single reference 
strain, and using sets of autologous 
strain peptides can be prohibitively 
expensive. A consensus may even be 
preferable to autologous peptides, as 
CTL escape mutations can rapidly pre- 
dominate in the viral quasispecies, and 
important early responses (40) may go 
undetected through the use of peptides 
based on isolates from later time points 
that have escaped the early responses. 
Consensus peptides for several sub- 
types are available (41). 

A similarity plot maps the percent 
similarity of a query sequence relative 
to a test set in a window spanning a 
region of a specified size that is moved 
progressively along an alignment. In 
Fig. 2, prototype vaccine reagents (a C 
consensus, two subtype C vaccine 
strains, and a subtype B isolate) are 
used as query sequences and compared 
with 23 subtype C sequences from 
South Africa, India, and Botswana. In every 
gene region, the same relative pattern holds. 
The spectrum of similarity scores for the C 
consensus sequence compared with the set of 
23 C sequences is 5 to 15% greater than when 
any one C isolate is compared with others in 
the set (28). In turn, subtype C proteins are 5 
to 15% more similar to the subtype C se- 
quences than are subtype B sequences. This 
implies that using a B clade virus as the basis 
of a vaccine in a C clade- dominated epidem- 
ic may be less effective than using a C clade 
virus, and a C clade virus may not be as 
effective as a C consensus. Conserved pro- 
teins from different subtypes can be more 
closely related than variable proteins from the 
same subtype, and this fact might be exploited 
by using a single vaccine strain for conserved 
proteins and multiple clade-specific 
strains for variable proteins. 
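
The sliding-window calculation behind a similarity plot can be sketched as follows. This is a minimal illustration, assuming a gap-free alignment and using the mean percent identity across the test set in each window; the summary statistic, window size, step, function name `similarity_scan`, and toy sequences are all assumptions made for the example rather than details of the analysis shown in Fig. 2.

```python
def similarity_scan(query, test_set, window=4, step=1):
    """Slide a window along an alignment.

    Returns a list of (window_start, mean percent identity between the
    query and every sequence in test_set within that window).
    """
    results = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        scores = []
        for seq in test_set:
            s = seq[start:start + window]
            matches = sum(a == b for a, b in zip(q, s))
            scores.append(100.0 * matches / window)
        results.append((start, sum(scores) / len(scores)))
    return results


# Hypothetical aligned toy sequences; the query plays the role of a
# candidate vaccine sequence compared with a set of isolates.
test_set = ["MKVLAGGT", "MRVLAGST", "MKVIAGGT"]
scan = similarity_scan("MKVLAGGT", test_set, window=4, step=2)
```

Plotting the second element of each pair against the window position gives one curve per query sequence; repeating the scan with a consensus, a same-subtype isolate, and an isolate of another subtype as the query yields a comparison of the kind described in the text.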

We have been discussing pooling sequences 
within a subtype to generate artificial central 
sequences, but it is also possible to pool the 
subtypes themselves. To maximize potential 
cross-reactivity, we have created sequences 
central to the M group, the diverse viruses that 
have contributed most to the global epidemic. 
The set of subtype consensus sequences was 
used to build an M-group consensus, thus 
weighting the subtypes equally. The M-group 
consensus and the most recent common ances- 
tor can be very nearly identical (7, 28). Because 
of the nature of the HIV-1 M-group phylogeny, 
the average distance from HIV-1 sequences to 
the M-group consensus is similar to intrasub- 
type sequence distances between contemporary 
isolates (32), roughly half that of intersubtype 
distances (Table 2). In the Democratic Repub- 
lic of the Congo (DRC) (42, 43), so many 
subtypes and recombinants circulate together 
that the extent of the regional diversity resem- 
bles the global diversity. In this setting, an 
M-group consensus may be helpful, or a poly- 
valent approach including representative strains 
from common subtypes along with the 
M-group consensus. Even in a design focusing 
on epitopes that are conserved across clades, an 
M-group consensus might be the optimal baseline 
sequence. Consensus and ancestral 
sequences for the major HIV-1 subtypes, 
CRFs, and the M group are available (3) 
and will be updated as sequences accrue. 
Intersubtype similarity comparisons 
with an M-group consensus are included 
in the supplementary material (57). 
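
The effect of building the M-group consensus from subtype consensus sequences, rather than from all sequences pooled together, can be illustrated with a small Python sketch. The subtype labels, sequences, and sample sizes below are invented for the example, and the alphabetical tie-breaking rule is an assumption.

```python
from collections import Counter


def consensus(aligned):
    """Most common character per alignment column (ties: alphabetical)."""
    out = []
    for column in zip(*aligned):
        counts = Counter(column)
        out.append(min(counts, key=lambda c: (-counts[c], c)))
    return "".join(out)


# Hypothetical data: subtype "X" is sampled far more heavily than "Y" or "Z".
subtypes = {
    "X": ["AAAA"] * 6,
    "Y": ["GGAA", "GGAA"],
    "Z": ["GGTA", "GGTA"],
}

# Pooling every sequence lets the oversampled subtype dominate.
pooled = consensus([s for seqs in subtypes.values() for s in seqs])

# A consensus of subtype consensuses gives each subtype one vote.
per_subtype = {name: consensus(seqs) for name, seqs in subtypes.items()}
group = consensus(list(per_subtype.values()))
```

Here the pooled consensus simply reproduces the oversampled subtype, whereas the two-step consensus reflects the state shared by most subtypes, which is the reason for weighting the subtypes equally.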

Consensus and ancestral sequences 
conserve CTL epitopes. Experimental- 
ly defined CTL epitopes in the HIV 
Immunology Database (3) cluster 
more densely in conserved regions of 
HIV proteins (44). The peptides span- 
ning variable regions used to detect 
CTL responses can be quite different 
from the infecting strain that elicited 
the response, no doubt contributing to 
the paucity of defined epitopes in vari- 
able domains, but, in addition, an en- 
richment of features that could contrib- 
ute to CTL escape can be discerned in 
variable domains (45). Either way, re- 
gions where defined epitopes are con- 
centrated are likely to be key for cross- 
reactive CTL responses (44). The 
epitopes in the database have primarily 
been defined for B clade responses; 
however, the C clade peptides that 
trigger immunodominant responses tend to be 
localized in these same regions (44, 46). 
Thus, in contrast to Fig. 2 and Table 2, where 
whole proteins were analyzed, we focused on 
protein regions where CTL epitopes have 
been found in order to create Fig. 4, which 
shows the average sequence distances from 
potential vaccine strains to immunogenic re- 
gions in subtype C proteins, by country of 
origin. Three proteins were selected, repre- 
senting the spectrum of variability: highly 
conserved p24, variable p17, and highly variable 
envelope (subunit gp160). In the immunogenic 
regions analyzed, C-subtype consensus 
and ancestral sequences had the fewest 
amino acid changes relative to contemporary 
C-subtype protein sequences. Within-subtype 
comparisons of single C-subtype viral strains 
and the M-group consensus sequences gave 
comparable numbers of amino acid changes, 
roughly half the number of changes relative 
to B subtype interclade comparisons. 

Table 2. Median and range of percent similarity scores between potential 
vaccine candidates and an alignment of C clade sequences. 

Consensus and ancestral sequences conserve 
predicted immunoproteasome cleavage 
sites. For viral proteins to be recognized by 
CTL they must be processed, and each step of 
epitope processing has potential constraints 
imposed by sequence specificity. Immune es- 
cape due to mutations in epitope flanking 
regions demonstrates escape from immune 
suppression through cleavage abrogation 
(47) and shows that epitope processing is 
sensitive to the surrounding sequence, al- 
though a simple cleavage signal is not readily 
discernible. If the tendency to be cleaved at a 
relevant site is markedly different in a vac- 
cine strain and a challenge strain, the immu- 
nological priming induced by the vaccine will 
be ineffective. This problem is difficult to 
resolve experimentally, so we addressed it 
computationally by means of NetChop (48), a 
neural net prediction program for 
immunoproteasome cleavage (32, 45). 

The median cleavage prediction scores for 
subtypes B and C were correlated, but although 
many sites preserved their relative tendency to 
be cleaved, there were many exceptions, posi- 
tions with high cleavage prediction scores in 
subtype B but not in C, or vice versa (32). This 
suggests that the predilection for cleavage of 
many sites would be altered in the two subtypes, 
which could result in diminished breadth of 
cross-reactive responses. C clade sequences and 
the M-group consensus gave cleavage predic- 
tion patterns that were similar when compared 
with the median scores for the C clade align- 
ment, and they performed better than sequences 
from the B clade (32). Scores predicted for the 
C-subtype consensus cleavage correlated most 
strongly with the median scores for the subtype 
C population (32), suggesting that it would be 
processed at any given position similarly to 
most of the subtype C strains, and so it may 
have the greatest potential for eliciting cross- 
reactive immune responses at the population 
level. The complete analysis is provided in the 
supplemental information (32), but in summary, 
the linear correlation coefficients (r² values) for 
comparisons of the median C clade cleavage 
scores to vaccine candidate strains are as fol- 
lows for positions in the Envelope protein: B 
clade, 0.65; the M-group consensus, 0.79; the 
M-group ancestor, 0.80; specific sequences 
from subtype C isolates, 0.79 to 0.81; the sub- 
type C ancestor, 0.88; and the subtype C con- 
sensus, 0.92. 
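
For readers who wish to reproduce this kind of comparison, the r² statistic can be computed as sketched below in Python. The sketch assumes that per-position cleavage scores have already been obtained from a prediction program and are supplied as equal-length lists; the score values, variable names, and the function name `r_squared` are hypothetical and do not come from the analysis reported here.

```python
def r_squared(x, y):
    """Squared Pearson correlation coefficient between two score vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant vector has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-position scores: the population median for a subtype,
# and two candidate vaccine sequences scored at the same positions.
median_scores = [0.1, 0.4, 0.8, 0.9]
candidate_a = [0.2, 0.8, 1.6, 1.8]   # tracks the median closely
candidate_b = [0.9, 0.1, 0.7, 0.3]   # tracks it poorly

r2_a = r_squared(median_scores, candidate_a)
r2_b = r_squared(median_scores, candidate_b)
```

A candidate whose predicted cleavage pattern follows the population median position by position yields an r² near 1, whereas a candidate with many discordant positions yields a lower value.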

Consensus or reconstructed ancestor? One 
might assume that an ancestral sequence would 
resemble more closely a real viral protein than 
a consensus. It is statistically extremely unlike- 
ly, however, that an ancestor corresponds to an 
ancestral sequence of a clade as complex and 
diverse as an HIV-1 subtype. Furthermore, re- 
construction greatly depends on assumptions 
inherent in building maximum likelihood trees; 
for example, if positions are not evolving inde- 
pendently or there are undetected recombina- 
tion events, the ancestral reconstruction would 
be distorted and potentially incorrect. Thus, it is highly 
improbable that an ancestor of a subtype ever 
existed precisely as reconstructed. Ancestor and 
consensus sequences are subject to different 
sampling biases and will change from year to 
year as sequences accrue. An ancestor is influ- 
enced by sequences external to the subtype of 
interest and will tend to be slightly more distant 
from available sequences within a subtype than 
a consensus sequence (Table 2), as well as 
slightly closer to sequences of other subtypes. 
The inclusion of a new outlier that branches 
near the basal node of a subtype could have a 
strong influence on the ancestral node (see, for 
example, Fig. 1, where the ancestral and con- 
sensus sequences are separated mainly because 
of one single outlier virus from South Africa), 
but as a single sequence it would have little 
bearing on the consensus. In contrast, a consen- 
sus will be influenced by the sampling of se- 
quences from within subclades. For example, if 
many sequences were obtained from the Indian 
subclade during the next year, the next C con- 
sensus would be more like the Indian subset, 
but the shift in sampling would have less impact 
on the subtype C ancestor, unless the new se- 
quences substantially altered the evolutionary 
model. 

Fig. 4. Amino acid percent differences between 
vaccine strain sequences and C-subtype se- 
quences in CTL epitopes. This analysis was lim- 
ited to protein subregions known to be immu- 
nogenic by requiring overlap with at least one 
well-characterized CTL epitope from the HIV 
immunology database (3). The consensus, ances- 
tral, and vaccine sequences were compared with 
all subtype C sequences, and subtype C sequenc- 
es broken down by country of origin for Botswa- 
na, India, and South Africa. The median differ- 
ence between the query and the C-subtype se- 
quence set is shown. Seventy-nine subtype C 
sequences were used for the p24 comparison, 79 
for p17, and 97 for gp160. The range of differ- 
ences is indicated only for the South African set, 

It is possible that a consensus sequence 
based on contemporary isolates may be more 
likely to reflect escape variants relevant to the 
host population than an ancestral sequence. 
For example, a CTL escape mutant in an 
epitope presented by a human leukocyte an- 
tigen molecule common in a certain popula- 
tion may be selected for and may be more 
likely to be represented in the consensus se- 
quence than a reconstructed ancestor se- 
quence. If most viruses in the circulating 
population had already lost the original 
epitope because of immune escape, and if the 
epitope elicited a dominant response upon 
vaccination with a strain that carried it, then 
the consensus sequence would have an ad- 
vantage. On the other hand, if the wild-type 
form of the epitope was still circulating, even 
infrequently, and if the epitope was particu- 
larly potent, there might be an advantage in 
using the ancestor. In the end, both concepts 
need to be tested experimentally, both in 
terms of B cell and T cell responses. 
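
The sensitivity of a consensus to sampling, discussed above, can be
illustrated with a minimal Python sketch. It assumes a simple
majority-rule consensus over pre-aligned, equal-length sequences, which
may differ from the procedure actually used for the consensus sequences
in this article. The five-residue peptides and the "African" and
"Indian" subclade labels are invented toy data, not real HIV-1
sequences, and the function names are hypothetical. Ancestral
reconstruction requires a phylogenetic model and is not attempted here.

```python
from collections import Counter
from statistics import median


def consensus(seqs):
    """Majority-rule consensus of aligned, equal-length sequences (assumed method)."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # For each alignment column, keep the most frequent residue.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def percent_difference(a, b):
    """Percentage of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)


def median_difference(query, seqs):
    """Median percent difference between a query and a set of sequences."""
    return median(percent_difference(query, s) for s in seqs)


# Hypothetical toy alignment: two subclades within one subtype.
african = ["MKVLA", "MKVLA", "MKILA"]
indian = ["MRVLS", "MRVLS"]

baseline = consensus(african + indian)

# A single divergent outlier leaves the consensus unchanged.
with_outlier = consensus(african + indian + ["MQILT"])

# Heavier sampling of one subclade pulls the consensus toward it.
indian_heavy = consensus(african + indian + ["MRVLS", "MRVLS"])

print(baseline, with_outlier, indian_heavy)
print(median_difference(baseline, african + indian))
```

In this toy case the single outlier does not move the consensus, whereas
adding two more sequences from one subclade changes it at two positions.
The last line computes the kind of summary statistic reported in Fig. 4,
the median percent difference between a query sequence and a reference
set.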

Applying Evolutionary Principles to 
Vaccine Strain Selection 

How can HIV's evolutionary trajectory be 
incorporated into a sensible vaccine ap- 
proach? Although subtypes of HIV-1 are 
phylogenetically defined on the basis of ge- 
netic and evolutionary distances, the practical 
consequence of phylogenetic clustering of 
viruses is patterns of shared amino acids that 
can influence the immunological cross-reac- 
tivity of vaccine-stimulated immune respons- 
es. Env proteins from different clades can 
differ in more than 30% of their amino acids, 
and HIV-1 continues to diversify. Neutraliz- 
ing antibody as well as CTL escape occurs in 
vivo (49, 50), escape mutations can be trans- 
mitted and stable (49), and there are protein 
regions under clear positive selection pres- 
sure (36) (Fig. 3). These observations indi- 
cate that HIV-1 amino acid variation is im- 
munologically relevant. The impact of that 
variation on vaccine-conferred immune pro- 
tection will ultimately have to be assessed 
through vaccine trials, but the differences 
between potential vaccines and circulating 
strains can be minimized when designing trial 
reagents to attempt to enhance cross-reactive 
responses. 

Most vaccines are intended to elicit poly- 
clonal responses to multiple epitopes, so even 
if they differ in some antigenic domains from 
a given virus, in others they may be cross- 
reactive. Selecting a clade-appropriate vac- 
cine for a regional trial would tend to increase 
the number of potentially cross-reactive 
epitopes by increasing the level of similarity 
between the vaccine and the population, and 
the use of consensus or ancestral sequences would 
enhance the cross-reactive potential. The dif- 
ference in selection pressure on B and C 
clade envelopes is indicative of lineage-spe- 
cific antigenicity, further supporting the use 
of subtype-appropriate vaccines to maximize 
the probability that the vaccine elicits im- 
mune responses to domains that are antigenic 
in the circulating viruses. 

We could see no compelling advantage 
in further subdividing the C clade by coun- 
try of origin, although this is often a 
consideration for vaccine design (8). Our 
analysis supports the recommendations of 
the international meeting on candidate vac- 
cines for the developing world sponsored 
by the Vaccine Research Center of the 
United States National Institute of Allergy 
and Infectious Diseases (57), indicating 
that, although there may be advantages to a 
subtype-specific vaccine, a promising 
subtype-specific vaccine candidate could 
be used in many different geographic loca- 
tions without compromising the potential 
for success. This does not mean that there 
would never be an advantage in tailor- 
ing a vaccine further by selecting a se- 
quence from an interior subclade within a 
subtype. For example, there might be an 
advantage in using an Asian, not African, 
CRF01 in Thailand, or an Indian C clade 
sequence in India. But within-clade differ- 
ences tend to be subtle and represent far 
fewer amino acid changes than between- 
subtype differences. 

In regions where an epidemic is dominat- 
ed by either a particular subtype or a CRF, it 
makes sense to use that dominant lineage for 
a vaccine and to consider the use of a con- 
sensus or ancestor. Although we cannot know 
if even the use of central sequences will be 
enough to contend with HIV diversity, this 
kind of strategy can potentially enhance the 
cross-reactivity and breadth of a vaccine re- 
sponse relative to any single strain. In regions 
where two or three subtypes and multiple 
recombinants are cocirculating, including 
each of the prevalent subtypes could improve 
the potential coverage not only of those sub- 
types, but of the variety of recombinant forms 
that stem from them (52). Finally, nations 
with very diverse viral populations, like the 
DRC, might be best served by developing 
polyvalent vaccines including a spectrum of 
natural forms combined with an M-group 
consensus. An M-group consensus or ances- 
tor is central not only to the major subtypes, 
but to recombinant forms involving the sub- 
types. Even if a single subtype predominates 
in a country, combining an M-group consen- 
sus with a regionally dominant subtype might 
be advantageous in an urban context where 
people of many nationalities mingle. 